ABSTRACT: Obtaining exact depth from binocular disparities is hard if camera
calibration is needed. We will show that qualitative information can be obtained from 
stereo disparities with little computation, and without prior knowledge (or computation) 
of camera parameters. First, we derive two expressions that order all matched points in 
the images by depth in two distinct ways from image coordinates only. Using one for 
tilt estimation and point separation (in depth) demonstrates some anomalies observed in 
psychophysical experiments, most notably the "induced size effect". We apply the same 
approach to detect qualitative changes in the curvature of a contour on the surface of an 
object, with either x- or y-coordinate fixed. Second, we develop an algorithm to compute
axes of zero-curvature from disparities alone. The algorithm is shown to be quite robust
against violations of its basic assumptions for synthetic data with relatively large controlled 
deviations. It performs almost as well on real images, as demonstrated on an image of four 
cans at different orientations. 
Introduction

Research in early vision regarding stereo seems to be concerned mainly with the correspon- 
dence problem, namely, matching points in the left and the right images. Obtaining exact 
depth values from a stereo pair has been considered a simple exercise whose solution is well 
known, although it might involve some tedious computations. Thus, it has been implicitly 
assumed that the final goal of stereo algorithms is to compute an exact depth map using 
disparity values. The following observations suggest, however, that depth computation from 
disparity values is not necessarily straightforward or even feasible, and that more qualitative 
depth information may be more robust and easier to obtain. 

First, the depth computation problem reduces to simple trigonometry when the pa- 
rameters of the cameras, or the eyes, are known. When they are not known, a scheme 
to compute the camera's parameters from a number of conjugate points (that is, matched 
pairs of points from the different images) has been devised, involving the solution of a set 
of nonlinear equations (see for instance [6]). Since the problem has no closed-form solution, 
and since the data are usually imprecise, a solution is found using iterative methods that 
minimize squared error. In practice, however, this approach is difficult to implement. The 
parameters of the cameras must be obtained from data that have errors comparable to the 
magnitude of the disparity values, which are the raw material used for depth computation 
(e.g., error due to pixel quantization). In other words, the registration problem (namely, 
finding parameters for the camera's calibration) is much more difficult than the computation 
of depth from disparity values. Less general methods to perform camera calibration have 
also been devised, see [14] and [7]. 

The other observation originates from biological vision. It seems that human vision 
does not necessarily obtain exact depth values from stereo disparity information alone, 
see, e.g., [5]. Rather, stereo disparity seems to be used mainly in obtaining qualitative 
depth information about objects in the field of view. The estimation of the magnitude of 
this relative depth is possibly dependent on an independent estimation of some physical 
parameters like the angle of convergence of the eyes. 

In view of that, we will show that qualitative relative depth information (ordering) of 
various kinds can be obtained from only conjugate points in two stereo images easily and 
reliably, involving almost no computations and independently of the camera's parameters. 
These orderings will demonstrate some anomalies that are observed in human psychophysics 
and presently lack other straightforward explanations, most notably the "induced size ef- 
fect". We will further show that some qualitative shape information can be obtained from 
image coordinates only. First, one can detect qualitative changes in the curvature of a con- 
tour on the surface of any object in the field of view, with either x- or y-coordinate fixed. 
Similarly, we estimate axes of zero-curvature for objects in the image from disparities alone. 
This algorithm is tested on synthetic and real data to check robustness against violations of 
the basic assumptions of the computation and the existence of noise. We then analyze the
dependence of errors due to quantization on parameters such as proximity to the axes and
the angle of convergence.

Basic Geometry 

Given two cameras, assume that the principal rays intersect at a fixation point. Also, 
assume that the epipolar plane of the fixation point (the plane through the principal rays 
of the cameras, henceforth "base plane") includes the X-axes of both cameras (which are, 
therefore, epipolar lines by definition). Thus rotation about the principal rays of the cameras 
is fixed. We will use the following coordinate system (see figure 1): let the fixation point be
the origin, the base plane (which passes through this point) be the X-Z plane, and the
line perpendicular to this plane through the origin be the Y-axis. On the X-Z plane, the
principal rays of both cameras intersect at the origin and create an angle $2\mu$ between them.
Let the Z-axis be the angle-bisector of $2\mu$, and the X-axis perpendicular to the Z-axis. This
system is closely related to the Cyclopean coordinate system used in the literature in which 
the angle-bisector is replaced by the median to the baseline of the cameras and the origin 
is translated to the midpoint of the baseline. A similar system can be defined for motion if 
the fixation point is kept constant, that is, the cameras follow a single object. This is more 
typical of human vision than machine vision. 

For a given point P let $\alpha$ denote the angle of tilt and $\beta$ denote the angle of slant (see
figure 1). Thus the Cartesian representation of P is $(\frac{z}{\tan\alpha}, \frac{z}{\tan\beta}, z)$,
where z is its depth relative to the fixation point in the above coordinate system. Let
$(x_l, y_l)$ and $(x_r, y_r)$ be the Cartesian coordinates of the projection of P on the left and
right images respectively (see lower part of figure 1). Using polar coordinates, the two
projections can be written as $(R_l, \vartheta_l)$ and $(R_r, \vartheta_r)$ respectively. Let
$\lambda = \frac{\cot\vartheta_r}{\cot\vartheta_l}$. Then the following can be shown to hold
(see appendix A):

$$\tan\alpha = \frac{1}{\tan\mu} \cdot \frac{\lambda - 1}{\lambda + 1} \tag{1}$$

$$\tan\beta = \frac{\cot\vartheta_r - \cot\vartheta_l}{2\sin\mu} \tag{2}$$

Thus, the two angles $\alpha$ and $\beta$ (of P) depend only on the angle of convergence and the
polar angle of the projection (P') on each image. It can be shown that the polar angles are
preserved under projection, through any point on the principal ray, onto either a spherical
body (like the eye) or a planar one (a camera). There is no dependence on other parameters
of the cameras, their relative positions or the angle of gaze $\nu$.
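As a sanity check on relations (1) and (2), the following is a minimal Python sketch. The pinhole model in `project` is an assumption of this sketch (one camera placement that is consistent with the relations above: both cameras in the base plane, at distance `d` from the fixation point, with focal length `h`). The names `project` and `tilt_slant` are illustrative and do not come from the paper.

```python
import math

def project(P, mu, d, h=1.0, right=True):
    """Assumed pinhole model: project P = (x, y, z), given in the
    fixation-point frame, onto the image of one camera."""
    x, y, z = P
    s = 1.0 if right else -1.0
    # Distance of P from the camera, measured along its principal ray.
    depth = d - s * x * math.sin(mu) + z * math.cos(mu)
    u = h * (x * math.cos(mu) + s * z * math.sin(mu)) / depth
    v = h * y / depth
    return u, v

def tilt_slant(p_r, p_l, mu):
    """Relations (1) and (2): (tan alpha, tan beta) from conjugate points."""
    cot_r = p_r[0] / p_r[1]          # cot(theta_r) = x_r / y_r
    cot_l = p_l[0] / p_l[1]          # cot(theta_l) = x_l / y_l
    lam = cot_r / cot_l
    tan_alpha = (lam - 1.0) / ((lam + 1.0) * math.tan(mu))
    tan_beta = (cot_r - cot_l) / (2.0 * math.sin(mu))
    return tan_alpha, tan_beta

# A point with z/x = 1/3 and z/y = 1/2, seen by cameras at unequal distances.
P, mu = (3.0, 2.0, 1.0), 0.2
tan_alpha, tan_beta = tilt_slant(project(P, mu, 50.0),
                                 project(P, mu, 40.0, right=False), mu)
```

Under this assumed model the recovered values agree with z/x and z/y whatever `d` and `h` are, which is the parameter independence stated above.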

Qualitative Depth 

We will use equation (1) of the previous section to obtain an ordering on all matched image
points according to their tilt and separately for the left and right halves of the visual field.
This ordering is independent of the camera's parameters and demonstrates psychophysical
anomalies like the induced effect and others. We will use equation (2) to obtain an expression
for the relative depth z. However, this expression will depend on the values of the
focal length, the distance between the cameras, and the angle of gaze. An approximate
parameter-independent relative depth ordering will then be obtained from this expression
for small angles of convergence $2\mu$. With additional assumptions and computations, the
exact coordinates can be computed (see appendix B).

Figure 1. Above, the 3D coordinate system defined by two cameras. Below, the image
plane of the right camera.

Tilt-related Order 

From (1) it immediately follows that $\alpha$ is monotonically increasing with $\lambda$ for a fixed
configuration of the cameras. Thus $\lambda$ defines a mathematical ordering of the matched points
in each side of the Y-axis according to their tilt. The ordering defined by $\lambda$ agrees with the
relative depth ordering when comparing points that lie approximately on the same line of
sight from the viewer, namely, about the same image x-coordinate.
Note that

$$\ln\lambda = \ln\frac{\cot\vartheta_r}{\cot\vartheta_l} = \ln\frac{x_r/x_l}{y_r/y_l} = (\ln x_r - \ln x_l) - (\ln y_r - \ln y_l) = \Delta(\ln x) - \Delta(\ln y). \tag{3}$$

In other words, if any matching algorithm is applied to the output images of the
transformation $T: (x, y) \to (\ln x, \ln y)$ performed on the original images, and the disparity vector
$(\Delta_x, \Delta_y)$ is then computed in the usual way, then the difference $\Delta_x - \Delta_y = \ln\lambda$ defines
an ordering similar to that defined by $\lambda$.
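The log-coordinate form of this ordering can be sketched in a few lines of Python. The function names and the sample conjugate points are hypothetical, and the sketch assumes positive image coordinates so that the logarithms are defined.

```python
import math

def log_disparity_order(p_r, p_l):
    """ln(lambda), computed as the difference of the x- and y-disparities
    measured after the transformation T: (x, y) -> (ln x, ln y)."""
    dx = math.log(p_r[0]) - math.log(p_l[0])   # disparity in ln x
    dy = math.log(p_r[1]) - math.log(p_l[1])   # disparity in ln y
    return dx - dy

def lam(p_r, p_l):
    """lambda = cot(theta_r) / cot(theta_l), with cot(theta) = x / y."""
    return (p_r[0] / p_r[1]) / (p_l[0] / p_l[1])

p_r, p_l = (2.0, 1.0), (1.0, 2.0)   # hypothetical conjugate points
ln_lam = log_disparity_order(p_r, p_l)
```

Since the logarithm is monotonic, sorting matched points by `ln_lam` or by `lam` gives the same order wherever `lam` is positive.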

One prediction of using $\lambda$ is the "induced effect", the psychophysical effect where a
distortion of one image by stretching the Y-axis (the vertical axis) of that image produces
a tilt impression similar to that produced by stretching the X-axis (the horizontal axis) of
the other image by the same amount (see figure 2). Whereas the tilt impression caused
by stretching the X-axis has a simple geometrical explanation, the reversed tilt impression
caused by stretching the Y-axis has none, and has therefore been called an induced effect:
it is as if the unrealistically magnified Y-axis of one image induces the
shrinkage of the X- and Y-axis of that image as a compensation. This effect, first reported
by Ogle ([13]), has stimulated extensive research, see [1], [2], [8], [10], [11], and [16].

Estimating tilt by $\lambda$ gives a similar misperception since $\lambda$ involves only terms of the
form $\frac{x}{y}$. Hence multiplying the Y-axis by some number has the same effect as multiplying the
X-axis by its inverse. Thus an induced effect is simply a side-effect of using the expression
$\frac{\cot\vartheta_r}{\cot\vartheta_l}$. This explanation does not depend on any assumptions and approximations, or the
complete recovery of all depth-related parameters of the scene. Other researchers ([10] and
[11]) explain the induced effect as a by-product of a specific approximation scheme and a
tedious numerical computation; it does not result from an exact solution of the disparity
equations. Another computational explanation ([2]) suggests that a distortion occurs in the
matching stage, assuming matching is done along horizontal lines only.
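The symmetry behind this explanation can be checked numerically. The sketch below uses hypothetical conjugate points and a hypothetical stretch factor `k`; it only illustrates that the tilt measure cannot distinguish a vertical stretch of one image from a horizontal stretch of the other.

```python
def lam(p_r, p_l):
    """Tilt measure: ratio of the cotangents of the two polar angles."""
    return (p_r[0] / p_r[1]) / (p_l[0] / p_l[1])

def stretch(p, kx=1.0, ky=1.0):
    """Scale the image axes of one image point."""
    return (kx * p[0], ky * p[1])

p_r, p_l, k = (2.0, 1.0), (1.0, 2.0), 1.1
lam_y_right = lam(stretch(p_r, ky=k), p_l)   # stretch the Y-axis of the right image
lam_x_left = lam(p_r, stretch(p_l, kx=k))    # stretch the X-axis of the left image
```

Both distortions divide the undistorted value by `k`, so any tilt estimate that depends on the images only through this ratio responds to them identically.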

Figure 2. An illustration of the induced effect. Above: the tilt impression caused by
stretching the X-axis of the right image (correct perception). Below: the tilt impression
caused by stretching the Y-axis of the right image (wrong perception). The same
perception is obtained if the X-axis of the left image is stretched by the same amount.

Motion also shows an illusion similar to the induced effect (see [16]). In this case,
observers reported that a fronto-parallel plane (a plane parallel to the X-Y plane) appeared
to be tilting in depth with the right-hand side apparently closer than the left when the
monocular image was progressively magnified with head movement to the right and vice
versa. $\lambda$ can account for this phenomenon. Moreover, in this case there is an additional
effect: a perceived forward/backward motion. This could possibly be accounted for by the
angle of gaze $\nu$. As will be shown later,

$$\tan\nu \approx \frac{1}{\tan\mu} \cdot \frac{1 - \frac{y_r}{y_l}}{1 + \frac{y_r}{y_l}}.$$

Thus, a distortion of the y-axis in one image will distort $\nu$ (a distortion of the x-axis will
affect $\nu$ much less). The angle of gaze $\nu$ can give the direction of motion as demonstrated in
the following example (figure 3): in motion from point 1 to point 2, $\nu$ is 0 (the true angle of
gaze). Positive $\nu$, the computed angle of gaze, implies motion from 1 to 2', that is, backward
movement of the head in addition to its left to right movement. Since the head only rotates,
the object is perceived as moving backwards.

Quantitatively, since computing $\lambda$ involves computing ratios of the images' x-coordinates
and y-coordinates, one should expect computational problems near the X- and Y-axes. It
is interesting to note that human performance also deteriorates near the axes, especially
near the horizontal one ([1]). This deterioration is accounted for by a smaller probability for
the correct detection of the tilt of an oblique line when either the x- or y-axis is magnified,
and when the angle between the oblique line and the horizontal axis is around either 0° or
90°.

Figure 3. False positive angle of gaze induces the perception of backward motion (point
2'), whereas the true value shows no such motion (point 2), see text.



Depth-related Order 



From equation (2) one can obtain an explicit expression for the depth z of a point relative
to the fixation point (the origin). First, note that (2) implies

$$z = \frac{y}{2\sin\mu}\,(\cot\vartheta_r - \cot\vartheta_l). \tag{4}$$

Thus, $\chi_y = (\cot\vartheta_r - \cot\vartheta_l)$ defines a relative depth ordering on all the points in
space with some constant height y over the base plane. It is shown in appendix A that

$$y = \frac{y_r}{h}\left\{\frac{l\cos(\mu-\nu)}{\sin 2\mu} - x\sin\mu + z\cos\mu\right\}, \tag{5}$$

where h is the focal length and l is the distance between the cameras. Substituting (5) in (4),
with $x = \frac{z}{\tan\alpha}$ taken from (1), gives

$$z = \frac{l\cos(\mu-\nu)/\sin 2\mu}{\dfrac{\sin\mu\,\{2h + \tan\mu\,(x_r + \frac{y_r}{y_l}x_l)\}}{x_r - \frac{y_r}{y_l}x_l} - \cos\mu}.$$

For an angle of convergence $2\mu$ small enough so that $2h > |\tan\mu\,(x_r + \frac{y_r}{y_l}x_l)|$ we obtain a
relative depth ordering on all the points in the visual field by using

$$\chi = x_r - \frac{y_r}{y_l}\,x_l.$$

Note that if $\mu$ is known, the exact depth can be computed up to a scaling factor.

As will be shown in the next section, $\frac{y_r}{y_l} = 1 + O(\mu)$. Likewise, since the field of view
is bounded by some solid angle $2\xi < 180^\circ$, it follows that $x < h\tan\xi$. Thus, a sufficient
condition for $\chi$ to give a correct relative depth ordering is $1 > \tan\mu \cdot \tan\xi$. If $2\xi < 90^\circ$,
which is a generous upper bound for most cameras, it is sufficient if $1 > \tan\mu$, or $2\mu < 90^\circ$.

We have obtained a relative depth ordering by an expression close to the x-disparity
(corrected for convergence and nonzero vergence, that is, nonzero angle of gaze). However,
for a fixed convergence angle $2\mu$, this ordering suffers some distortion compared to the
true relative depth ordering, which increases with the horizontal distance from the point of
fixation (the x-coordinate).
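A minimal Python sketch of the depth-related order follows. The function `chi` uses image coordinates only. The function `depth_from_chi` encodes one reading of the expression obtained by substituting (5) in (4); its argument `d_r` is an assumed shorthand for the quantity l cos(mu - nu) / sin(2 mu), and the conjugate pairs are hypothetical values chosen for illustration.

```python
import math

def chi(p_r, p_l):
    """chi = x_r - (y_r / y_l) * x_l for one conjugate pair ((x_r, y_r), (x_l, y_l))."""
    return p_r[0] - (p_r[1] / p_l[1]) * p_l[0]

def depth_from_chi(p_r, p_l, mu, h, d_r):
    """Depth z relative to the fixation point, given camera parameters.
    d_r is an assumed name for l * cos(mu - nu) / sin(2 * mu)."""
    s = p_r[0] + (p_r[1] / p_l[1]) * p_l[0]
    denom = math.sin(mu) * (2.0 * h + math.tan(mu) * s) / chi(p_r, p_l) - math.cos(mu)
    return d_r / denom

# Hypothetical conjugate pairs, ordered by chi without any camera parameter.
pairs = {
    "a": ((0.10, 0.20), (0.08, 0.20)),
    "b": ((0.10, 0.20), (0.12, 0.21)),
    "c": ((0.30, 0.10), (0.25, 0.10)),
}
order = sorted(pairs, key=lambda name: chi(*pairs[name]))
```

Only `chi` is needed for the qualitative ordering; `depth_from_chi` is included to show which extra parameters a metric depth value would require.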

Approximation of Vertical Disparities 

From the definition of $\lambda$ and $\chi$ it follows that the base plane itself is singular in the sense
that these orders are not defined for points on it. One can, however, estimate the orders
by substituting $\frac{y_r}{y_l}$ of a matched point far from the base plane. More specifically, for
P = (x, y, z), and with $d_l$ and $d_r$ the distances from the fixation point to the left and right
cameras, we have

$$\frac{y_r}{y_l} = \frac{d_l}{d_r} + \frac{z}{d_r}\cdot\frac{2\sin\mu\tan\nu}{1+\tan\mu\tan\nu} + \frac{x}{d_r}\cdot\frac{2\sin\mu}{1+\tan\mu\tan\nu} + o\!\left(\frac{x}{d_r},\frac{z}{d_r}\right)$$

$$= 1 - \frac{2\tan\mu\tan\nu}{1+\tan\mu\tan\nu} + \frac{z}{d_r}\cdot\frac{2\sin\mu\tan\nu}{1+\tan\mu\tan\nu} + \frac{x}{d_r}\cdot\frac{2\sin\mu}{1+\tan\mu\tan\nu} + o\!\left(\frac{x}{d_r},\frac{z}{d_r}\right).$$

Thus, if point $P^j$ is used to approximate point $P^i$ the error will be:

$$\left(\frac{y_r}{y_l}\right)^j - \left(\frac{y_r}{y_l}\right)^i \approx \frac{2\sin\mu}{1+\tan\mu\tan\nu}\left\{\frac{z^j - z^i}{d_r}\tan\nu + \frac{x^j - x^i}{d_r}\right\}.$$

The error is especially small when the approximating point $P^j$ lies exactly "above" $P^i$
(differs only in the y-coordinate).

One can use as an approximation $\frac{1-\tan\mu\tan\nu}{1+\tan\mu\tan\nu}$ (the first two terms), so that some
(possibly independent) estimate of $\mu$ (half the angle of convergence) and $\nu$ (the angle of gaze)
will suffice to assign a rough value to $\frac{y_r}{y_l}$ when no other source of information is available.
Note that one cannot take $\frac{y_r}{y_l}$ as 1 when computing $x_r - \frac{y_r}{y_l}x_l$, as a first order
approximation in $\mu$, since $x_r - x_l$ is of the order of magnitude of $\mu$ also.
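The fallback described in the last paragraph can be sketched as follows. The function names are hypothetical, and the estimates of the half-convergence angle `mu` and the angle of gaze `nu` are assumed to come from some independent source.

```python
import math

def vertical_ratio_estimate(mu, nu):
    """Rough value for y_r / y_l from the first two terms of its expansion."""
    t = math.tan(mu) * math.tan(nu)
    return (1.0 - t) / (1.0 + t)

def chi_on_base_plane(x_r, x_l, mu, nu):
    """Depth-related order for a point on the base plane, where the
    measured ratio y_r / y_l is not available."""
    return x_r - vertical_ratio_estimate(mu, nu) * x_l
```

For a symmetric gaze (`nu = 0`) the estimate reduces to 1, and the order reduces to the plain x-disparity.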

Support from Psychophysical Results 

The orders $\lambda$ and $\chi$ as defined above and the dependency of the scaling coefficients on
camera parameters seem to be consistent with the following psychophysical results:



1. Relative depth perception in human vision seems to be more reliable than absolute depth
perception. That is, the distinction between different objects at different depths is much
more accurate than the estimation of their absolute depth (with no additional information
of perspective).

2. The induced effect, as discussed above, is shown to be a side effect of using the tilt-related
order $\lambda$ to estimate the tilt of a plane at the fixation point. No assumptions on
the way the visual system finds and interprets corresponding points are needed. Moreover,
this is a "local" explanation of the induced effect in the sense that it allows for opposite
induced effects in neighboring spatial regions, in agreement with psychophysical evidence
(see [16]). Likewise, this explanation does not imply a perceived asymmetric convergence
of the eyes, again in agreement with empirical data. It is interesting to note that $\lambda$
might be the discrete equivalent to the term of the optical flow field used by Rogers and
Koenderink ([16]) to explain the induced effect in motion parallax. Quantitatively, $\lambda$ is
more susceptible to errors near the axes, in agreement with psychophysical experiments
([1]) that show deterioration in human performance of tilt estimation near the axes.
Such an effect for a plane not passing through the fixation point will be predicted by the
qualitative shape analysis in the next section.

3. Comparison of the depth of features on both sides of the Y-axis is less accurate than if
they are on both sides of the X-axis. This is demonstrated in the following experiment
([15]). The first stimulus is a surface whose depth (Z) as a function of X is shown in
figure 4a. Here Z is constant with Y. When comparing the depth of the two edges $F_l$ and
$F_r$, that are at the same depth, the observer (wrongly) perceives $F_l$ as closer to her than
$F_r$. In the second stimulus the same configuration is rotated by 90° so that Z changes
with Y as is shown in figure 4b and is constant with X. Now the observer (rightly)
perceives $F_l$ and $F_r$ as equidistant. This result is consistent with $\lambda$, which orders points
on each side of the Y-axis separately. $\chi$ is also defined differently on each side of the
Y-axis, and it has some distortion as a function of the horizontal distance between the
two points.

4. There is psychophysical evidence for the deterioration in depth discrimination when
points are coplanar with the point of fixation (see [12]). Note that $\lambda$ is constant for
points coplanar with the fixation point.

5. There is empirical evidence for the dependence of relative depth perception on external
perception of the angle of convergence of the eyes. This is consistent with using the
orders $\lambda$ and $\chi$ to evaluate depth with no additional computations.

Moreover, $\lambda$ and $\chi_y$ depend only on the polar angles of the conjugate points in both
images, a quantity that is preserved under projection to a spherical body (an eye) or a
planar body (a camera). It is interesting to note, in this respect, that the first visual
transformation from the retina to V1 in primates seems to be in good agreement with the
complex-log mapping ([17]), namely: $(x, y) \to (\log r, \vartheta)$. This mapping explicitly computes
the polar angle $\vartheta$ of a point.








Figure 4. Anisotropy between the horizontal (X) and vertical (Y) dimensions in human 
depth perception (see text). 

Qualitative Shape and Applications 

We will use the basic relations obtained above to compute some qualitative information on 
the surface of objects. Robustness against noise and violations of the assumptions of the 
basic computations will be tested on synthetic and real data. 

Axes of Zero-Curvature 



Given three different points on an object, $P_0$, $P_1$ and $P_2$, the vector $(P_1 - P_0) \times (P_0 - P_2)$ is
parallel to the normal to the plane that passes through the three points. It vanishes if (and
only if) $P_0$, $P_1$ and $P_2$ are collinear in space, which is the case for points on a zero-curvature
axis (and for only such points) by definition. Moreover, collinear points are projected onto
a straight line in each image. Thus, an algorithm to find a zero-curvature axis at a point
$P_0$ on the surface of an object, when a single one exists, is as follows:

Let $O_0$ be the projection of $P_0$ on one image plane. For each line in this plane, which
passes through $O_0$ at some angle $\theta$ (the lines are parameterized by $\theta$), select two points on
line $\theta$ in the image plane, denoted by $O_1$ and $O_2$. These points are the projections of points
$P_1$ and $P_2$ that lie on the surface of the same object (this is a condition for the selection of
$O_1$ and $O_2$). For convenience, select $O_1$ and $O_2$ such that each lies on a different side of $O_0$
on line $\theta$. Next, estimate the expression:

$$\Theta = |(P_1 - P_0) \times (P_0 - P_2)|.$$

Return the direction $\theta$ that minimizes $\Theta$ (ideally, under infinite resolution and precision,
$\Theta$ will be zero).

This is useful when there is one and only one axis of zero-curvature through each point 
on the object, e.g. cylinders and cones (see [3] for a different approach to this problem). 
It is most useful for cylinders, where the directions of zero-curvature axes are the same at 
each point on the object. 
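The search over directions can be sketched in Python for the idealized case in which the 3D points are known exactly. The helper names and the sample points (taken on a hypothetical unit cylinder whose axis is parallel to the Y-axis) are assumptions of this sketch; the paper's algorithm estimates the same quantity from image coordinates instead.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def theta_measure(P0, P1, P2):
    """|(P1 - P0) x (P0 - P2)|, which vanishes for collinear points."""
    u = tuple(a - b for a, b in zip(P1, P0))
    v = tuple(a - b for a, b in zip(P0, P2))
    return math.sqrt(sum(c * c for c in cross(u, v)))

def best_direction(samples):
    """samples maps each candidate angle (degrees) to a triple (P0, P1, P2)."""
    return min(samples, key=lambda ang: theta_measure(*samples[ang]))

# Points on the cylinder x^2 + z^2 = 1 around P0, for three candidate directions.
P0 = (0.0, 0.0, -1.0)
samples = {
    0: (P0, (0.6, 0.0, -0.8), (-0.6, 0.0, -0.8)),
    45: (P0, (0.6, 0.6, -0.8), (-0.6, -0.6, -0.8)),
    90: (P0, (0.0, 1.0, -1.0), (0.0, -1.0, -1.0)),
}
axis_angle = best_direction(samples)
```

In this example only the 90° direction runs along the cylinder, so it is the one for which the three points are collinear.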

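Had we the exact depth values at each point, we could compute $\Theta$ exactly and find the
direction $\theta$ that minimizes it. To estimate $\Theta$ without knowing the exact depth values, recall
that

$$P = z\left(\frac{1}{\tan\alpha},\ \frac{1}{\tan\beta},\ 1\right) = z\left(\tan\mu\,\frac{\lambda+1}{\lambda-1},\ \frac{2\sin\mu}{\chi_y},\ 1\right)$$

($\lambda = \frac{\cot\vartheta_r}{\cot\vartheta_l}$ and $\chi_y = (\cot\vartheta_r - \cot\vartheta_l)$ as defined above). Let

$$\Theta^* = (P_1 - P_0) \times (P_0 - P_2) = (\Theta_x, \Theta_y, \Theta_z).$$

The following can be readily verified:

$$\Theta_x = 2 z_1 z_2 \sin\mu \cdot \tilde\Theta_x, \qquad \Theta_y = -z_1 z_2 \tan\mu \cdot \tilde\Theta_y, \qquad \Theta_z = 2 z_1 z_2 \tan\mu\sin\mu \cdot \tilde\Theta_z,$$

and

$$\tilde\Theta_x = \frac{z_0}{z_2}\left(\frac{1}{\chi_y^1} - \frac{1}{\chi_y^0}\right) - \left(\frac{1}{\chi_y^1} - \frac{1}{\chi_y^2}\right) + \frac{z_0}{z_1}\left(\frac{1}{\chi_y^0} - \frac{1}{\chi_y^2}\right),$$

$$\tilde\Theta_y = \frac{z_0}{z_2}\left(\frac{\lambda_1+1}{\lambda_1-1} - \frac{\lambda_0+1}{\lambda_0-1}\right) - \left(\frac{\lambda_1+1}{\lambda_1-1} - \frac{\lambda_2+1}{\lambda_2-1}\right) + \frac{z_0}{z_1}\left(\frac{\lambda_0+1}{\lambda_0-1} - \frac{\lambda_2+1}{\lambda_2-1}\right),$$

$$\tilde\Theta_z = \frac{z_0}{z_2}\left(\frac{\lambda_1+1}{\lambda_1-1}\cdot\frac{1}{\chi_y^0} - \frac{\lambda_0+1}{\lambda_0-1}\cdot\frac{1}{\chi_y^1}\right) - \left(\frac{\lambda_1+1}{\lambda_1-1}\cdot\frac{1}{\chi_y^2} - \frac{\lambda_2+1}{\lambda_2-1}\cdot\frac{1}{\chi_y^1}\right) + \frac{z_0}{z_1}\left(\frac{\lambda_0+1}{\lambda_0-1}\cdot\frac{1}{\chi_y^2} - \frac{\lambda_2+1}{\lambda_2-1}\cdot\frac{1}{\chi_y^0}\right).$$

For each i, $\lambda_i$ and $\chi_y^i$ depend only on the polar coordinates of the projections of
each $P_i$, and $\frac{z_i}{z_j}$ can be approximated by image coordinates to a first order in the angle of
convergence $\mu$. That is,

$$\frac{z_i}{z_j} \approx \frac{\chi^i}{\chi^j}, \qquad \chi = x_r - \frac{y_r}{y_l}\,x_l.$$

Thus $\tilde\Theta_x$, $\tilde\Theta_y$, and $\tilde\Theta_z$ can be estimated from image coordinates only.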

As noted above, the estimated axis of zero-curvature at point $O_0$ will be the axis that
obtains the minimum of $\Theta = (\Theta_x^2 + \Theta_y^2 + \Theta_z^2)^{1/2}$. However, since in practice $\Theta$ will probably
rarely obtain 0, it is possible that an unfortunate choice of $O_1$ and $O_2$ for some $\theta$'s and
the fact that $\Theta$ is not normalized will lead to a bad estimate. Nevertheless, our algorithm
minimizes the approximate expression $\tilde\Theta = (2\tilde\Theta_x)^2 + (\tilde\Theta_y)^2 + (\frac{1}{2}\tilde\Theta_z)^2$.
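The image-based estimate can be sketched in Python as below. This is one reading of the three component expressions, written in terms of the cotangents of the polar angles; the function names, the sample cotangents, the sample depth ratios, and in particular the default weight `wz` on the z term are assumptions of this sketch.

```python
def theta_tilde(cots, r02, r01):
    """Components of the image-based estimate for points P0, P1, P2.
    cots: [(cot_r, cot_l)] for P0, P1, P2.
    r02, r01: the ratios z0/z2 and z0/z1, or their image-based estimates."""
    A = [(cr + cl) / (cr - cl) for cr, cl in cots]   # (lambda + 1) / (lambda - 1)
    B = [1.0 / (cr - cl) for cr, cl in cots]         # 1 / chi_y
    tx = r02 * (B[1] - B[0]) - (B[1] - B[2]) + r01 * (B[0] - B[2])
    ty = r02 * (A[1] - A[0]) - (A[1] - A[2]) + r01 * (A[0] - A[2])
    tz = (r02 * (A[1] * B[0] - A[0] * B[1])
          - (A[1] * B[2] - A[2] * B[1])
          + r01 * (A[0] * B[2] - A[2] * B[0]))
    return tx, ty, tz

def objective(t, wz=0.5):
    """Weighted sum of squares to be minimized over candidate directions.
    The weight wz is an assumed parameter of this sketch."""
    tx, ty, tz = t
    return (2.0 * tx) ** 2 + ty ** 2 + (wz * tz) ** 2

# Hypothetical cotangents and depth-ratio estimates for one candidate direction.
cots = [(1.2, 1.0), (1.5, 1.1), (0.9, 0.85)]
score = objective(theta_tilde(cots, r02=4.0, r01=0.5))
```

Evaluating `score` for every candidate direction and keeping the smallest value mirrors the minimization described above.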

We first tested the algorithm on some images of cans at different orientations in space. 
In the example of figure 5, the camera was moved manually to obtain a stereo pair. The 
fixation point (and hence the origin of each image coordinate system) was taken to be the 
center of the right image and the corresponding point in the left image (which in the above 
"bad" example is a few pixels to the left of the center of the left image). We still assume that 
the X — Z plane approximates the base plane (first section). The two 256x256 images have 
been matched using a parallel motion algorithm implemented on the Connection Machine 
([9]), and its output has been smoothed by averaging with a 3x3 window over neighboring 
pixels. In a fixed region at the center of each object, the direction of the zero-curvature 
axis has been estimated using the above algorithm at each pixel. The direction obtained 
by the largest number of pixels in the region was selected as a final estimate. In an image 
containing four cylindrical objects at various orientations, the true axis of zero-curvature 
has been obtained for three (figure 5). A rather good approximation has been obtained 
for the fourth, where the "second best" direction has been selected (we have used a rather 
coarse quantization of directions). The error in the fourth can may possibly be due to the 
fact that this can practically lies on the X-axis, where quantization errors seem to prevail 
(see figure 6). Additional errors may occur if the central region is not chosen appropriately, 
that is, if it lies too close to the boundary of an object or if it covers an area with little texture.

We then used synthetic images of cylinders and cones to test the robustness of the 
algorithm against errors and deviations from its basic assumptions. Table 1 summarizes the 
results for synthetic cylinders and cones defined by

$$\frac{(x - x_0)^2}{a^2} + \frac{(z - z_0)^2}{b^2} = 1 \quad\text{and}\quad \frac{(x - x_0)^2}{a^2} + \frac{(z - z_0)^2}{b^2} = \frac{(y - y_0)^2}{c^2}$$

respectively. In this example we have used a = 4, b = 4, and c = 1. The angle of
convergence is 30°, the distance between the cameras is 13, and the coordinates of the
center of the cylinder or cone $(x_0, y_0, z_0)$ are (50, 50, 20) in the 3-D fixation-point coordinate
system defined in the first section. However, the algorithm is not sensitive to the exact value
of any of these parameters. The point of fixation is equidistant to both cameras.

The objects have been rotated relative to an initial orientation where the main axis 
is parallel to the y-axis by different values of £ (rotation about the z-axis) and £ (rotation 
about the x-axis), see table 1. We checked robustness against deviations from the basic
assumptions of the computation by introducing the following errors: $(\Delta x_i, \Delta y_i)$, $i \in \{r, l\}$, are
misalignments of the points of fixation at the two cameras (in image coordinates, where the
focal length of the camera is unity); $\delta_r$ and $\delta_l$ are the angles (in degrees) between the true
image X-axis and the X-axis assumed (the line of intersection between the image plane and
the base plane in the right and left images respectively). If any of these errors exists, the
X-axis of one image does not lie (exactly) in the base plane, and/or the cameras are not
fixated on exactly the same point. Table 1 summarizes the results. The error column gives
the difference in degrees between true and estimated axis of zero-curvature in the image
plane. (Note that in the image plane a zero-curvature axis is defined by a single angle.)

Figure 5. Axes of zero-curvature obtained from "qualitative" shape. Above is the stereo
pair. Below is the left image where white lines on each object mark the axes of
zero-curvature found by the algorithm.

Table 1. Axes of zero-curvature for synthetic cylinders (above the separation line) and
cones (below the separation line). See text for the meaning of the different columns.
The last column gives the error in degrees (the difference between the true axis of
zero-curvature and the one obtained by the above algorithm).



Figure 6. The dependence of quantization errors on proximity to the horizontal and the
vertical axes, scaled by the distance from the cameras to the fixation point.

Figure 7. The dependence of quantization error on the angle of convergence of the cameras
(above) and the amount of quantization (below).

Finally, we measured the dependence of quantization errors on variables such as proximity
to the axes, resolution, and the angle of convergence of the cameras. One can see
(figure 6) that the performance of the algorithm deteriorates substantially as the reference 
point on the object approaches the X-axis. There is much less deterioration near the Y-axis. 
Figure 7 shows the dependence of the error on the angle of convergence of the cameras. Far 
from the axes, the error is small even for substantial quantization, about a hundred times 
coarser than human visual hyperacuity (figure 7, below). The functions plotted in figures
6 and 7 measure the average value of the error over some fixed patch of the field of view with
only one parameter varying. Some smoothing has been applied to these plots (each point 
has been averaged with its two nearest neighbors). 

Ordering of Surface Normals 

For any two points $P_1$ and $P_2$, where $P_1 = z_1(\frac{1}{\tan\alpha_1}, \frac{1}{\tan\beta_1}, 1)$ and
$P_2 = z_2(\frac{1}{\tan\alpha_2}, \frac{1}{\tan\beta_2}, 1)$, let $N = P_1 \times P_2$. N is perpendicular to $(P_1 - P_2)$.
(It is actually proportional to the normal to the plane passing through $P_1$, $P_2$, and the
fixation point.) After some calculations, it can be shown that

$$N \propto \left(\frac{\cot\beta_1 - \cot\beta_2}{\cot\alpha_1\cot\beta_2 - \cot\alpha_2\cot\beta_1},\ \frac{\cot\alpha_2 - \cot\alpha_1}{\cot\alpha_1\cot\beta_2 - \cot\alpha_2\cot\beta_1},\ 1\right)$$

$$= \left(\frac{1}{\tan\mu}\,f(\vartheta_r^1, \vartheta_l^1, \vartheta_r^2, \vartheta_l^2),\ \frac{1}{\sin\mu}\,g(\vartheta_r^1, \vartheta_l^1, \vartheta_r^2, \vartheta_l^2),\ 1\right),$$

where

$$f(\vartheta_r^1, \vartheta_l^1, \vartheta_r^2, \vartheta_l^2) = \frac{(\cot\vartheta_r^2 - \cot\vartheta_l^2) - (\cot\vartheta_r^1 - \cot\vartheta_l^1)}{(\cot\vartheta_r^1 - \cot\vartheta_r^2) + (\cot\vartheta_l^1 - \cot\vartheta_l^2)},$$

$$g(\vartheta_r^1, \vartheta_l^1, \vartheta_r^2, \vartheta_l^2) = \frac{\cot\vartheta_r^1\cot\vartheta_l^2 - \cot\vartheta_l^1\cot\vartheta_r^2}{(\cot\vartheta_r^1 - \cot\vartheta_r^2) + (\cot\vartheta_l^1 - \cot\vartheta_l^2)}.$$

Thus, as long as $f(\vartheta_r^1, \vartheta_l^1, \vartheta_r^2, \vartheta_l^2)$ and $g(\vartheta_r^1, \vartheta_l^1, \vartheta_r^2, \vartheta_l^2)$ remain constant, which can be
determined from image coordinates only, the points are coplanar (among themselves and with
the fixation point), or the object at the fixation point is planar. Note that $\lambda$ is obtained
from f when $\cot\vartheta_r^2 = \cot\vartheta_l^2 = 0$ (g = 0 then).

For any object it is possible to obtain qualitative information about its surface along
any contour, with either x or y fixed. Take a contour on the surface with some fixed
y-coordinate, and let $P_1$ and $P_2$ be two points on it. Since the y-coordinate of $P_1 - P_2$ is 0, the
projection of N on the X-Z plane, $n \propto (\frac{1}{\tan\mu} f(\vartheta_r^1, \vartheta_l^1, \vartheta_r^2, \vartheta_l^2), 1)$, is perpendicular to the
projection of $P_1 - P_2$. Thus, for fixed y, the one dimensional boundary contour is convex
when $f(\vartheta_r^1, \vartheta_l^1, \vartheta_r^2, \vartheta_l^2)$ increases with increasing x, concave when it decreases,
and linear when it remains constant. Note that $\chi_y$ can be obtained from f
since the sign of f determines relative depth between two points with fixed y-coordinate.
The same qualitative description can be obtained for any boundary contour with fixed x
from following $g(\vartheta_r^1, \vartheta_l^1, \vartheta_r^2, \vartheta_l^2)$ with increasing y. This qualitative description depends only
on image coordinates (specifically, on the polar angles of conjugate points). Thus it predicts
an "induced effect" for planes that do not include the point of fixation. Obtaining this
description is not straightforward, though, since such contours in the 3D coordinate system
will be usually mapped to oblique lines in the image plane.
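The planarity test based on f and g can be sketched in Python as follows. The two functions are written out as one reading of the relation N = P1 x P2 in terms of the cotangents of the polar angles; the names and the tolerance are assumptions of this sketch.

```python
def f(c1, c2):
    """c1, c2: (cot_r, cot_l) of two conjugate pairs."""
    (r1, l1), (r2, l2) = c1, c2
    return ((r2 - l2) - (r1 - l1)) / ((r1 - r2) + (l1 - l2))

def g(c1, c2):
    """Companion of f, built from the same four cotangents."""
    (r1, l1), (r2, l2) = c1, c2
    return (r1 * l2 - l1 * r2) / ((r1 - r2) + (l1 - l2))

def coplanar_with_fixation(cots, tol=1e-6):
    """True if f and g stay constant over all pairs (first point, other point)."""
    fs = [f(cots[0], c) for c in cots[1:]]
    gs = [g(cots[0], c) for c in cots[1:]]
    return (max(fs) - min(fs) <= tol) and (max(gs) - min(gs) <= tol)
```

Tracking how `f` changes along a contour of fixed y (or `g` along a contour of fixed x) gives the convex, concave, or linear classification described above.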

In the general case we estimate the normal to the plane passing through three points
$P_1$, $P_2$, and $P_3$ to a first order in $\mu$ and the x-disparity

$$\frac{x_r - \frac{y_r}{y_l}x_l}{2h\sin\mu}.$$

In this case, after substituting $K \cdot (x_r - \frac{y_r}{y_l}x_l)$ as an approximation for z, where K is some
constant that depends on $\mu$, $\nu$, and h, one gets an expression for the general normal $N_G$:

$$N_G = (P_1 - P_2) \times (P_2 - P_3) \approx W\,\bigl(V \cdot F(x_r^1, x_l^1, y_r^1, y_l^1, x_r^2, x_l^2, y_r^2, y_l^2, x_r^3, x_l^3, y_r^3, y_l^3),$$
$$\qquad U \cdot G(x_r^1, x_l^1, y_r^1, y_l^1, x_r^2, x_l^2, y_r^2, y_l^2, x_r^3, x_l^3, y_r^3, y_l^3),\ 1\bigr),$$

where W, V, and U are some constants that depend on $\mu$, $\nu$ and h. F() and G() are some
functions of image coordinates only. Once again, one can verify approximate planarity of
surfaces anywhere in the field of view when F() and G() remain constant.

Summary 

The goal of this work has been to obtain and use qualitative information from a stereo pair, 
with as few computations as possible and with a minimal dependence on the camera and 
scene parameters. First, we have shown that points in a stereo pair, once matched to each
other, can be ordered in two related ways according to their tilt ($\lambda$) and their depth ($\chi$).
These orders are completely determined by image coordinates of conjugate points, and no
camera or scene parameters are needed. $\lambda$ and a variation of $\chi$ ($\chi_y$) depend only on
the polar angles of the conjugate points in both images, a quantity that is preserved under
projection to a spherical body (an eye) or a planar body (a camera). We discussed some
psychophysical (and neurophysiological) results that can be understood by the use of such
orders. Most notably, the use of $\lambda$ for tilt estimation predicts the "induced size effect", an
unusual behavior of the human visual system that lacks another straightforward explanation.

Some qualitative shape information has then been obtained from stereo disparities. We 
first developed an algorithm to detect axes of zero-curvature on objects (where a single 
such axis exists). This algorithm performed well on synthetic data even when the basic 
assumptions of the computation were substantially violated. It performed almost as 
well on real images. In one example of an image of four cans, the true axes of zero-curvature 
were found for three cans, and the "second best" axis was found for the fourth. 
Finally, we showed that one can detect qualitative changes in the curvature of a contour on 
the surface of any object in the field of view, with either x- or y-coordinate fixed. 
Figure 8. The projection of point P on the image plane. 



Appendix A: Derivation of Some Geometrical Relations 



Figure 8 illustrates the projection of a point in space ($P$) onto the image plane. From 
similarity of triangles it follows that 

$$\frac{BP}{B'P'} = \frac{BO}{B'O} = \frac{DO}{A'O} = \frac{AO - AB\cos\varphi}{A'O}.$$

$B'P'$ is the image coordinate $y$, $A'O$ is the focal length $h$ and $AO$ is the distance from the 
fixation point to the camera $d$. Thus 

$$y = \frac{h \cdot BP}{d - AB\cos\varphi}. \qquad (6)$$

In a similar way 

$$x = \frac{h \cdot AB\sin\varphi}{d - AB\cos\varphi}.$$

Thus 

$$\frac{x}{y} = \frac{AB}{BP}\sin\varphi.$$

The assumption that the base plane intersects both cameras' X-axes implies that the same 
geometry holds for both cameras in the sense that the segments $BP$ and $AB$ are identical. 
We add indices $l$ and $r$ for the variables of the left and right cameras respectively. Then 

$$\frac{x_r}{y_r} = \frac{AB}{BP}\sin\varphi_r, \qquad \frac{x_l}{y_l} = \frac{AB}{BP}\sin\varphi_l, \qquad (7)$$

and finally 

$$\frac{x_r}{y_r} \Big/ \frac{x_l}{y_l} = \frac{\sin\varphi_r}{\sin\varphi_l}. \qquad (8)$$







Figure 9. The 3D coordinate system with both cameras. 

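From figure 9 it follows that 

$$\varphi_r = \alpha + 90^\circ - \mu, \qquad \varphi_l = \alpha + 90^\circ + \mu.$$

Thus 

$$\frac{\sin\varphi_r}{\sin\varphi_l} = \frac{\cos(\alpha - \mu)}{\cos(\alpha + \mu)} = \frac{1 + \tan\alpha\tan\mu}{1 - \tan\alpha\tan\mu}. \qquad (9)$$

From (8) and (9) we get 

$$\lambda = \frac{x_r}{y_r} \Big/ \frac{x_l}{y_l} = \frac{1 + \tan\alpha\tan\mu}{1 - \tan\alpha\tan\mu} \quad\Longrightarrow\quad \tan\alpha\tan\mu = \frac{\lambda - 1}{\lambda + 1},$$

which gives equation (1). 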
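
The relation between the image ratio $\lambda$ and the tilt can be checked numerically. The following is a minimal sketch, assuming the angle relations read off figure 9 ($\varphi_r = \alpha + 90^\circ - \mu$, $\varphi_l = \alpha + 90^\circ + \mu$); the function names and the sample angles are hypothetical and chosen only for illustration.

```python
import math

def lambda_ratio(alpha, mu):
    """(x_r/y_r) / (x_l/y_l) = sin(phi_r) / sin(phi_l); angles in radians."""
    phi_r = alpha + math.pi / 2 - mu
    phi_l = alpha + math.pi / 2 + mu
    return math.sin(phi_r) / math.sin(phi_l)

def tan_alpha_tan_mu(lam):
    """Invert the ratio: tan(alpha) * tan(mu) = (lam - 1) / (lam + 1)."""
    return (lam - 1.0) / (lam + 1.0)

mu = math.radians(4.0)  # hypothetical half-angle of convergence
tilts = [math.radians(a) for a in (-60, -20, 0, 25, 70)]
lams = [lambda_ratio(a, mu) for a in tilts]

# The inverted ratio reproduces tan(alpha) * tan(mu) for every sample tilt.
for a, lam in zip(tilts, lams):
    assert math.isclose(tan_alpha_tan_mu(lam),
                        math.tan(a) * math.tan(mu), abs_tol=1e-12)

# For fixed mu > 0 (and alpha + mu < 90 degrees) lambda grows with the tilt,
# so sorting points by lambda sorts them by tilt without knowing mu.
assert lams == sorted(lams)
```

The last assertion illustrates why $\lambda$ can serve as an ordering: the unknown $\mu$ only rescales $\tan\alpha$, it does not change the order.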
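Similarly, 

$$\cot\vartheta_r - \cot\vartheta_l = \frac{x_r}{y_r} - \frac{x_l}{y_l} = \frac{AB}{BP}(\sin\varphi_r - \sin\varphi_l) = \frac{AB}{BP} \cdot 2\sin\alpha\sin\mu = 2\sin\mu \cdot \frac{z}{y}.$$

Since by definition $\tan\beta = z/y$ we immediately obtain equation (2). 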
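
As a sanity check of this identity, the sketch below projects one point through both cameras using the projection relations of this appendix and compares the difference of the two ratios $x/y$ with $2\sin\mu \cdot z/y$. It assumes $z = AB\sin\alpha$ and $y = BP$ for the 3D coordinates; all names and numeric values are hypothetical.

```python
import math

def project(AB, BP, phi, d, h=1.0):
    """Image coordinates (x, y) from the projection relations of this appendix."""
    denom = d - AB * math.cos(phi)
    return h * AB * math.sin(phi) / denom, h * BP / denom

def cot_difference(AB, BP, alpha, mu, d_r, d_l):
    """cot(theta_r) - cot(theta_l), computed as x_r/y_r - x_l/y_l."""
    x_r, y_r = project(AB, BP, alpha + math.pi / 2 - mu, d_r)
    x_l, y_l = project(AB, BP, alpha + math.pi / 2 + mu, d_l)
    return x_r / y_r - x_l / y_l

AB, BP = 0.3, 0.2                                  # hypothetical segment lengths
alpha, mu = math.radians(35), math.radians(3)      # hypothetical tilt and half-convergence
z, y = AB * math.sin(alpha), BP                    # assumed 3D coordinates of the point

lhs = cot_difference(AB, BP, alpha, mu, d_r=1.1, d_l=0.9)
assert math.isclose(lhs, 2 * math.sin(mu) * z / y, rel_tol=1e-9)
```

The camera distances `d_r` and `d_l` and the focal length `h` cancel in each ratio $x/y$, which is why the difference depends only on the polar angles.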

Finally, considering the right image with no loss of generality, we have from (6), 

$$y_r = \frac{h \cdot BP}{d_r - AB\cos\varphi_r} = \frac{h y}{d_r - x\sin\mu + z\cos\mu}.$$

From figure 9 one can see that $d_r = \frac{I\cos(\mu - \nu)}{\sin 2\mu}$, so that 

$$y = \Big( \frac{I\cos(\mu - \nu)}{\sin 2\mu} - x\sin\mu + z\cos\mu \Big) \cdot \frac{y_r}{h}. \qquad (5)$$

sm Zfx n 



Appendix B: Computation of the Exact Depth 

We will compute the exact tilt and depth to a first order in the angle of convergence $2\mu$, 
following Mayhew & Longuet-Higgins' method (in [11]) to compute tilt and slant of a plane 
through the fixation point. The following scheme, however, will be simpler and involve fewer 
and more rigorous assumptions (we shall only assume small $2\mu$ as implied above). Since, to 
a first order in $\mu$, $\tan 2\mu \approx I\cos(\nu)/R$, where $R$ is the distance between the fixation point and 
the midpoint of the interocular line (the nose), our computations will be correct to a first 
order in $(I/R)$. 

Let $(x, y)$ and $(x', y')$ denote the image coordinates of a certain point in space on 
the two cameras respectively. Let $\alpha$ and $\beta$ denote the parameters of a plane that passes 
through a given point in space and the fixation point in the above coordinate system, so 
that $Z = \alpha X + \beta Y$. Thus the present $\alpha$ is $\tan(\alpha)$ in the previous notation if $\beta = 0$, and vice versa. 
Then, to a first order in $\mu$, we have ([4]) 

$$\Delta x = x' - x = [(\alpha\cos(\nu) + \sin(\nu))x + \beta\cos(\nu)y + (\cos(\nu) - \alpha\sin(\nu))x^2 - \beta\sin(\nu)xy] \cdot I/R,$$
$$\Delta y = y' - y = [\sin(\nu)y + (\cos(\nu) - \alpha\sin(\nu))xy - \beta\sin(\nu)y^2] \cdot I/R. \qquad (10)$$

(The coordinate system used to obtain (10) is the Cyclopean coordinate system. This, 
however, does not change the results when changing to our coordinate system since the 
angle-bisector and the median are the same line to a first order in $\mu$ and the translation of 
the origin has been taken into account in the definition of the target plane.) 
It is usually possible to consider only the plane that passes through a point in space and 
the fixation point and is perpendicular to the base plane (that is, $\beta = 0$). This plane is 
parametrized only by $\alpha$. Thus, we have 

$$\frac{\Delta y}{y} = [\sin(\nu) + (\cos(\nu) - \alpha\sin(\nu))x] \cdot I/R = [\tan(\nu) + (1 - \alpha\tan(\nu))x] \cdot \tan(2\mu). \qquad (11)$$

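Let $(x_1, y_1, \Delta x_1, \Delta y_1)$ be the coordinates and disparities of a point on the vertical 
axis, so that $x_1 \approx 0$. Then we have 

$$\frac{\Delta y_1}{y_1} = \tan(\nu) \cdot \tan(2\mu).$$

Let $(x_2, y_2, \Delta x_2, \Delta y_2)$ be the coordinates and disparities of a point with $\alpha \approx 0$. Such a point, if 
it exists, can be easily identified since it satisfies $\frac{\Delta x_2}{x_2} \approx \frac{\Delta y_2}{y_2}$. Then we have: 

$$\frac{\Delta y_2}{y_2} = \tan(\nu) \cdot \tan(2\mu) + x_2 \cdot \tan(2\mu) = \frac{\Delta y_1}{y_1} + x_2 \cdot \tan(2\mu).$$

In other words, 

$$\tan(2\mu) = \frac{1}{x_2} \cdot \Big[ \frac{\Delta y_2}{y_2} - \frac{\Delta y_1}{y_1} \Big].$$

Now, for any point $(x, y)$ in the image we have, using (10) with $\beta = 0$: 

$$\frac{x'}{x} - \frac{y'}{y} = \frac{\Delta x}{x} - \frac{\Delta y}{y} = \alpha \cdot I\cos(\nu)/R = \alpha \cdot \tan(2\mu).$$

This leads to the final equations 

$$\tan(2\mu) = \frac{1}{x_2} \cdot \Big[ \frac{\Delta y_2}{y_2} - \frac{\Delta y_1}{y_1} \Big], \qquad (12)$$

and 

$$\alpha = \frac{\frac{\Delta x}{x} - \frac{\Delta y}{y}}{\tan(2\mu)}; \qquad \tan(\nu) = \frac{\Delta y_1 / y_1}{\tan(2\mu)}; \qquad R = \frac{I}{\sqrt{\tan^2(2\mu) + \big(\frac{\Delta y_1}{y_1}\big)^2}}.$$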
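
The closed-form scheme above can be exercised on synthetic disparities. The sketch below is only an illustration under stated assumptions: disparities are generated from the first-order model (10) with $\beta = 0$, $\tan 2\mu$ stands for $I\cos(\nu)/R$, and the function names, the interocular distance `I` and all sample values are hypothetical.

```python
import math

def disparities(x, y, alpha, nu, I, R, beta=0.0):
    """First-order disparities (dx, dy) of equation (10)."""
    c, s, k = math.cos(nu), math.sin(nu), I / R
    dx = ((alpha * c + s) * x + beta * c * y
          + (c - alpha * s) * x * x - beta * s * x * y) * k
    dy = (s * y + (c - alpha * s) * x * y - beta * s * y * y) * k
    return dx, dy

def recover(p1, p2, p, I):
    """Recover (tan2mu, alpha, tan_nu, R) from three matched points.

    p1: point on the vertical axis (x ~ 0); p2: point with alpha ~ 0;
    p:  the point whose plane parameter alpha is wanted.
    Each point is a tuple (x, y, dx, dy).
    """
    _, y1, _, dy1 = p1
    x2, y2, _, dy2 = p2
    x, y, dx, dy = p
    r1 = dy1 / y1                            # = tan(nu) * tan(2 mu)
    tan2mu = (dy2 / y2 - r1) / x2            # equation (12)
    alpha = (dx / x - dy / y) / tan2mu
    tan_nu = r1 / tan2mu
    R = I / math.hypot(tan2mu, r1)
    return tan2mu, alpha, tan_nu, R

nu, I, R = 0.1, 0.065, 1.0                   # hypothetical viewing geometry
p1 = (0.0, 0.2) + disparities(0.0, 0.2, 0.7, nu, I, R)
p2 = (0.15, 0.1) + disparities(0.15, 0.1, 0.0, nu, I, R)
p = (-0.1, 0.25) + disparities(-0.1, 0.25, 0.4, nu, I, R)

tan2mu, alpha, tan_nu, R_est = recover(p1, p2, p, I)
assert math.isclose(alpha, 0.4, rel_tol=1e-9)
assert math.isclose(tan_nu, math.tan(nu), rel_tol=1e-9)
assert math.isclose(R_est, R, rel_tol=1e-9)
```

Within the first-order model the recovery is exact; with measured disparities the ratios would be noisy, and the choice of the two reference points would matter.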



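The ratio $\frac{\Delta y_1}{y_1}$ near the Y-axis is relatively reliable and easy to obtain. However, a point 
with $\alpha \approx 0$ does not necessarily exist, in which case we can solve the initial scheme directly. 
The set of equations that remains to be solved is 

$$\tan(2\mu) - \alpha \cdot \frac{\Delta y_1}{y_1} = \frac{1}{x} \cdot \Big( \frac{\Delta y}{y} - \frac{\Delta y_1}{y_1} \Big),$$
$$\tan(2\mu) \cdot \alpha = \frac{\Delta x}{x} - \frac{\Delta y}{y}, \qquad (13)$$

which reduces to a second degree polynomial in $\alpha$. 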
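
To make the reduction concrete: writing $r_1 = \Delta y_1 / y_1$, $e = \frac{1}{x}(\frac{\Delta y}{y} - r_1)$ and $D = \frac{\Delta x}{x} - \frac{\Delta y}{y}$, eliminating $\tan(2\mu)$ from the two equations gives $r_1\alpha^2 + e\alpha - D = 0$. The sketch below solves it on synthetic data generated from (10) with $\beta = 0$. The names are hypothetical, and returning both roots (rather than picking one) is a choice of this sketch, since the text does not say how the root is selected.

```python
import math

def disparities_beta0(x, y, alpha, nu, k):
    """Equation (10) with beta = 0; k stands for I/R."""
    c, s = math.cos(nu), math.sin(nu)
    dx = ((alpha * c + s) * x + (c - alpha * s) * x * x) * k
    dy = (s * y + (c - alpha * s) * x * y) * k
    return dx, dy

def alpha_candidates(x, y, dx, dy, r1, eps=1e-12):
    """Solve r1*a**2 + e*a - D = 0; return candidate (alpha, tan2mu) pairs."""
    D = dx / x - dy / y
    e = (dy / y - r1) / x
    if abs(r1) < eps:                       # nu ~ 0: the equation is linear
        return [(D / e, e)]
    disc = e * e + 4.0 * r1 * D
    if disc < 0.0:                          # no real solution (e.g. noisy data)
        return []
    sq = math.sqrt(disc)
    roots = [(-e + sgn * sq) / (2.0 * r1) for sgn in (1.0, -1.0)]
    return [(a, e + a * r1) for a in roots]  # tan2mu from the first equation

nu, k, alpha_true = 0.1, 0.065, 0.4         # hypothetical values
x, y = -0.1, 0.25
dx, dy = disparities_beta0(x, y, alpha_true, nu, k)
r1 = math.sin(nu) * k                       # Delta y_1 / y_1 of a point with x_1 = 0

cands = alpha_candidates(x, y, dx, dy, r1)
assert any(math.isclose(a, alpha_true, rel_tol=1e-9)
           and math.isclose(t, k * math.cos(nu), rel_tol=1e-9)
           for a, t in cands)
```

In this synthetic model the second root is $\alpha = -1/\tan(\nu)$, which does not depend on the point; this suggests that comparing candidates across several points could help separate the two roots, though that selection rule is an assumption and not part of the scheme above.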